Parameter Sensitivity Analysis of the
Energy/Frequency Convexity Rule 
for Nanometer-scale Application Processors 

Karel DeVogeleer, Gerard Memmi, and Pierre Jouvelot 


Abstract: Both theoretical and experimental evidence is presented in this work to validate the existence of an Energy/Frequency Convexity Rule, which relates energy consumption and microprocessor frequency for nanometer-scale microprocessors. Data gathered during several month-long experimental acquisition campaigns, supported by several independent publications, suggest that the energy consumed does indeed depend on the microprocessor's clock frequency and, more interestingly, that the curve exhibits a clear minimum over the processor's frequency range. An analytical model for this behavior is presented and motivated, which fits well with the experimental data. A parameter sensitivity analysis shows how parameters affect the energy minimum in the clock frequency space. The conditions under which this convexity rule can be exploited, and when other methods are more effective, are discussed with the aim of improving the computer system's energy management efficiency. We show that the power requirements of the computer system, besides the microprocessor, and the overhead affect the location of the energy minimum the most. The sensitivity analysis of the Energy/Frequency Convexity Rule puts forward a number of simple guidelines that are especially applicable to low-power systems, such as battery-powered and embedded systems, and less likely to apply to high-performance computer systems.

Index Terms: DVFS, energy optimization, Energy/Frequency Convexity Rule, SoC.



1 Introduction 

The execution time characteristics and power requirements of a code sequence are the main drivers that define its final energy consumption. This is a direct result of the definition of electrical energy consumption: the integral of electrical power over time. The execution time is influenced by the type and the number of operations contained in the code sequence of concern. For example, register-based operations require less energy to execute than external memory-based instructions. As such, each functional unit within a microprocessor and, more generally, each component of the computer system has its own power and execution time profile. As a result, every code sequence has different power and execution time demands. For example, Carroll and Heiser showed that, for an embedded system running equake, vpr, and gzip from the SPEC CPU2000 benchmark suite, the microprocessor energy consumption exceeds the RAM energy consumption, whereas crafty and mcf from the same suite drew more energy from the device RAM.

A property of a code sequence's energy consumption is that, under certain assumptions, it is convex in the clock frequency, which is henceforth referred to as the Energy/Frequency Convexity Rule. The rule states that there exists an optimal clock frequency for the execution of each sequence of code that minimizes the energy consumption of that code sequence. Under certain conditions this optimal clock frequency lies between the minimum and maximum clock frequency. The existence of a minimum energy point results from the behavior of the microprocessor's power and the execution time w.r.t. the clock frequency. The microprocessor's power increases with clock frequency, meaning that more energy is consumed per unit of time when the microprocessor's speed is increased. On the other hand, the lower the clock frequency, the longer the execution time, which increases the energy expenditure. As will be shown, running at the optimal clock frequency is a trade-off between performance, in terms of execution time, and energy savings. For applications requiring human interaction, it has been shown that the clock frequency can be scaled down considerably without affecting the user's experience. In this paper, experimental evidence is presented, supported by several independent publications, for the existence of an Energy/Frequency Convexity Rule that relates energy consumption and microprocessor clock frequency on mobile devices. This convexity property seems to ensure the existence of an optimal frequency where energy consumption is minimal. This existence claim is based on both theoretical and practical evidence on a System-on-Chip (SoC). Data gathered via acquisition campaigns on multiple platforms suggest that the energy consumed per input element is strongly correlated with microprocessor clock frequency and, more interestingly, that the corresponding curve exhibits a clear minimum over a frequency window specific to the computer system. An analytical model of this behavior is also motivated, which fits well with the experimental data. A parameter sensitivity analysis is carried out to assess the influence of the parameters on the optimal frequency minimizing energy consumption. This optimal frequency is shown to increase when the power requirements of the computer system, excluding the microprocessor's, increase. Clock cycles lost to routine maintenance of the system also force the optimal frequency up. The optimal frequency as derived from the theoretical framework is, however, independent of the number of instructions to be executed.

In addition to a deeper theoretical and practical understanding of a microprocessor's energy consumption and the Energy/Frequency Convexity Rule, this paper offers a new, in-depth parameter sensitivity analysis compared to what was presented in De Vogeleer et al. The main contributions of this paper are thus:

• a theoretical framework for the Energy/Frequency Convexity Rule;

• a sensitivity analysis of the Energy/Frequency Convexity Rule to estimate the impact of multiple input parameters;

• an analysis of the Energy/Frequency Convexity Rule under special conditions, such as out-of-order execution (OOE) and absence of slack time;

• supportive experimental data and a comprehensive survey of the state of the art.

The rest of the paper is organized as follows. Section 2 elaborates the Energy/Frequency Convexity Rule. Following the presentation of experimental results in Section 3, a parameter sensitivity analysis is carried out in Section 4. An overview of the related work is presented in the state-of-the-art section. Finally, the concluding section lists the main conclusions drawn from our analysis, supporting a better usage of energy, especially for embedded systems.

2 Single-Core Convexity Model 

The energy consumption of a computer system comprising a microprocessor, and possibly other components, over a time interval $\Delta t$ is equal to the integral of the system's power usage over time:

$$E_{sys}(\Delta t) = \int_0^{\Delta t} P_{sys}(t)\,dt = \int_0^{\Delta t} I(t) \cdot V(t)\,dt. \tag{1}$$

If the power is considered constant, the integral is equivalent to the product of the power consumption and the timespan of interest. $V(t)$ can often be considered constant by design; for example, portable devices such as smartphones are supplied by 3.7 V lithium-ion batteries, and microprocessors operate at very specific voltage levels. The time-dependent variation of the current depends on the context, its history and the state of the microprocessor. However, at the time frame of an instruction execution, henceforth referred to as a time quantum, the power consumption can be deemed quasi constant. Following this definition, the parameters that define the energy consumption during a time quantum are also constant. As such, similar to the rationale behind the Riemann sum, the total energy consumption of a code sequence can be thought of as the sum of the energy consumption during each time quantum $\Delta t_i$:

$$E_{sys} = \sum_{i=1}^{n} E_{sys,i} = \sum_{i=1}^{n} P_{sys,i} \cdot \Delta t_i, \tag{2}$$

where $n$ is the number of time quanta and $\Delta t_i$ is the time frame over which $P_{sys,i}$ is constant. $\Delta t_i$ could be the length of one instruction execution or, when the power variance is negligibly small, the length of an arbitrary-sized code sequence. One has $\Delta t = \sum_{i=1}^{n} \Delta t_i$.
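As a minimal sketch of the sum over time quanta, assuming the per-quantum powers and durations are already available as plain lists (the function name `total_energy` and the sample values are illustrative and not taken from the measurements), the total energy can be computed as:

```python
def total_energy(powers_w, durations_s):
    """Sum P_sys,i * dt_i over all time quanta (the Riemann-style sum)."""
    if len(powers_w) != len(durations_s):
        raise ValueError("one duration is needed per power sample")
    return sum(p * dt for p, dt in zip(powers_w, durations_s))


# Hypothetical quanta: three power levels (W), each held for a duration (s).
energy_j = total_energy([0.5, 0.8, 0.5], [0.002, 0.001, 0.003])
```

The result is in joules when powers are given in watts and durations in seconds; the sum of the durations equals the total interval over which the energy is accumulated.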

The models for the power and execution time are developed separately in the next two subsections. A more detailed exposition of the models can be found in De Vogeleer.

2.1 Power Model 

A computer system's power usage $P_{sys}$ is the sum of three power components:

1) $P_{cpu}$, the microprocessor's power,

2) $P_{drop}$, the system's power usage that is dependent on or controllable by the microprocessor, and

3) $P_{back}$, the system's power that is independent of the microprocessor.

$P_{drop}$ can be due to components that are put to sleep when the microprocessor does not need their functionality, e.g., audio codecs, camera circuits, or the radio interface. $P_{back}$ comprises components that require power independently of what the microprocessor is doing, e.g., memory refreshing in synchronous dynamic random access memory (SDRAM). $P_{back}$ is, however, controllable. Note that the display of a hand-held device also falls under $P_{back}$, as it is active when the user requires interaction with the device, not necessarily when the microprocessor is active.

For the formulation of the microprocessor's power $P_{cpu}$, we combined the well-known expression for an electronic circuit's power dissipation, which scales with $V^2 f$ and is referred to as the dynamic power, and the leakage current model of Skadron et al.:

$$P_{cpu} = (1 + \gamma V) \cdot \xi f V^2, \tag{3}$$

where $\gamma$ is a parameter describing the magnitude of the leakage currents due to capacitor-based circuits, $V$ is the supply voltage, $f$ is the clock frequency, and $\xi$ is a parameter defining the power requirements of the microprocessor. It is known that the leakage currents are temperature-dependent. Henceforth, however, we deem the temperature constant throughout our analysis.
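The power model can be sketched in a few lines, assuming frequencies in GHz, voltages in V, $\xi$ in W/(GHz·V²) and $\gamma$ in 1/V (the units listed later for the reference use case). The function names `p_cpu` and `p_sys` are illustrative choices, not names used by the authors.

```python
def p_cpu(f_ghz, v, xi, gamma):
    """Microprocessor power: (1 + gamma * V) * xi * f * V^2."""
    return (1.0 + gamma * v) * xi * f_ghz * v ** 2


def p_sys(f_ghz, v, xi, gamma, p_drop=0.0, p_back=0.0):
    """System power as the sum of the three components P_cpu, P_drop, P_back."""
    return p_cpu(f_ghz, v, xi, gamma) + p_drop + p_back
```

With $\gamma = 0$ the expression reduces to the purely dynamic term $\xi f V^2$, which makes the relative weight of the leakage term easy to inspect.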

2.2 Execution Time Model

The execution time $\Delta t$ of a code sequence, including slack time $\beta$ and time thieves $f_k$ (the time spent by the operating system), can be modeled as:

$$\Delta t = cc_b \left( \beta + \frac{1}{f - f_k} \right), \tag{4}$$

where $cc_b$ is the number of clock cycles dedicated to the execution of the user program's statements, $f_k$ the average number of clock cycles per time unit lost due to time thieves, and $\beta$ the average amount of slack time per clock cycle. By definition $f > f_k$, since the system cannot steal more clock cycles than are available.

Time thieves, represented by $f_k$ in Equation 4, are clock cycles lost due to low-level operations. These time thieves have higher priority than $cc_b$. Examples of $f_k$ are pipeline stalls due to branch mispredictions, misaligned memory accesses, page faults, operation interventions, interrupt handling, operating system routine tasks, etc. The slack time, represented by $cc_b \beta$, is the time during which the microprocessor cannot continue execution as it is waiting for external data, e.g., from the main memory due to cache misses. Slack time can be addressed with out-of-order execution (OOE), which would scale down $\beta$ (see Section 4.3).
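The execution time model can be sketched as follows, assuming it takes the form $\Delta t = cc_b\,(\beta + 1/(f - f_k))$, which is the form implied by the slack-time term $cc_b \beta$ above and by the normalized energy terms used later. The function name `execution_time` is illustrative, and any consistent set of units may be used (for example cycles, cycles per second and seconds per cycle).

```python
def execution_time(cc_b, f, f_k, beta):
    """Execution time cc_b * (beta + 1 / (f - f_k)) of a code sequence."""
    if f <= f_k:
        # All cycles are consumed by time thieves; the model is undefined here.
        raise ValueError("f must exceed f_k")
    return cc_b * (beta + 1.0 / (f - f_k))
```

The guard mirrors the requirement that the clock frequency must exceed the rate at which cycles are lost to overhead.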


2.3 System’s Energy Consumption Model 

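Inserting the power model and the execution time model from Equations 3 and 4, respectively, into the definition of the system's energy consumption in time quantum $i$ gives:

$$\begin{aligned} E_{sys,i} &= P_{sys,i} \cdot \Delta t_i \\ &= (P_{cpu,i} + P_{drop,i} + P_{back}) \cdot \Delta t_i \\ &= \left( (1 + \gamma V) \cdot \xi_i f V^2 + P_{drop,i} + P_{back} \right) \cdot cc_b \left( \beta + \frac{1}{f - f_k} \right). \end{aligned} \tag{5}$$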

Here, $P_{sys,i}$ is a monotonically increasing function of $f$, whereas $\Delta t_i$ is a monotonically decreasing function of $f$, given that $\{P_{drop,i}, P_{back}, \gamma, \xi_i, f_k, \beta\} \in \mathbb{R}^+$. Note that $P_{back}$ and $cc_b$ are scaling factors of $E_{sys,i}$, and that this implies that the energy consumed during the execution of a piece of code is linearly dependent on its code complexity and background power demands. Moreover, this also implies that compiler optimization techniques that target code size optimization will also directly lead to an improved energy profile of the code. On the other hand, a microprocessor can also reduce energy consumption by parallelizing code execution, increasing power demands but reducing execution time. Similar observations on the interaction between energy and power consumption were made by Valluri and John.

At this stage, we apparently observe only a hyperbolic relation between energy and frequency. We have to take into account the relationship between the voltage and the frequency to find a convex analytical relationship between $E$ and $f$. Such convexity is of interest as there would exist a microprocessor configuration that minimizes the energy consumption for that particular combination of $\{P_{drop,i}, P_{back}, \gamma, \xi_i, f_k, \beta\}$.


2.4 Voltage/Frequency Relationship 

The following derivation regarding the energy/frequency relationship is similar to that of Yuki and Rajopadhye [10]; however, different, mainly more contemporary, frequency and voltage relationships are used, and the leakage current is scaled more realistically. Note that $P_{back}$ and $P_{drop}$ can be arbitrarily large; their values are inherent to the computer system and independent of the microprocessor. In the remainder of this work it is also assumed that the temperature of the microprocessor remains constant unless otherwise noted. In practice it was shown by De Vogeleer that the microprocessor's power requirements show a strong exponential relation with the temperature. The non-linear temperature effects complicate the microprocessor's temporal power demands considerably. The temperature has, however, a small impact on the convex behavior of the Energy/Frequency Convexity Rule. Therefore, we omit the temperature effects on the Energy/Frequency Convexity Rule further on.

Fig. 1. Frequency/voltage relationships of multiple modern and vintage application microprocessors, as found in the Linux kernel. Two dashed linear curves are drawn: $V = m_1 f + m_2$; the red one is fitted on the depicted data of the three Exynos microprocessors; the blue one is borrowed from Yuki and Rajopadhye [10].

For modern microprocessors, the frequency $f$ and supply voltage $V$ are approximately linearly related, as shown in Figure 1. Note that the S3C6410 and the PXA320 are fairly outdated microprocessors and their low performance is visible; the Exynos series and the Intel M are more recent microprocessors designed for embedded multimedia applications, e.g., smartphones and tablets. The exact relationship between the voltage and frequency depends on the physical abilities of the microprocessor's internals, but also on the capability of the microprocessor's voltage and frequency regulator to scale the voltage and frequency on demand. When the frequency of a microprocessor is ramped up, the transistors inside need to switch faster to meet timing and delay constraints. As subparts of transistors are essentially very small capacitors as well, a finite time is required to switch a transistor from one state to another. Thus, if stringent timing delays need to be met, the microprocessor voltage needs to be increased accordingly. The higher supply voltage will decrease the transistors' transition time and the capacitors' charging time. This translates into a positive slope of the frequency/voltage relationship.

An affine relation between voltage and frequency is expressed as follows:

$$V = m_1 f + m_2, \tag{6}$$

where $m_1$ and $m_2$ are positive regression coefficients. Figure 1 shows the voltage and frequency relationship for several microprocessors. The values of $m_1$ and $m_2$ for the dashed blue line in Figure 1 are motivated to be adequate for high-performance microprocessors based on theoretical values [10]. Here, the values of $m_1$ and $m_2$ for the dashed red line are shown to better represent the voltage/frequency relationship for microprocessors for embedded applications. These values approximate a linear fit on the combined data of the Exynos microprocessors.

Henceforth, the microprocessor's default clock frequency window ($\mathcal{F}_{cpu}$) is defined as the clock frequency range bounded by the minimum and maximum clock frequency of the microprocessor:

$$\mathcal{F}_{cpu} \triangleq f_{min} \le f_{cpu} \le f_{max}. \tag{7}$$

We have seen in Section 2.2 that $f > f_k$. Hence, the exploitable clock frequency window ($\mathcal{F}_{exp}$) is defined as the frequency range with an upper bound given by the microprocessor's maximum frequency $f_{max}$ and a lower bound given by the larger of the microprocessor's minimum frequency $f_{min}$ and $f_k$:

$$\mathcal{F}_{exp} \triangleq \max(f_{min}, f_k) \le f_{cpu} \le f_{max}. \tag{8}$$

It is the exploitable clock frequency window that is open for energy optimization via clock frequency scaling.


2.5 Optimal Microprocessor Clock Frequency $f_{opt}$

The power model independent of $V$ is obtained by inserting Equation 6 in the definition of $P_{cpu}$:

$$P_{cpu} = (1 + \gamma V) \xi f V^2 = a f^4 + b f^3 + c f^2 + d f, \tag{9}$$

where $a = \gamma \xi m_1^3$, $b = \xi m_1^2 (1 + 3 \gamma m_2)$, $c = \xi m_1 m_2 (3 \gamma m_2 + 2)$, and $d = \xi m_2^2 (\gamma m_2 + 1)$. This power formulation can then be inserted in the energy consumption model $E_{sys}$ of Equation 5.
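The polynomial form can be checked numerically. The sketch below assumes that expanding $(1 + \gamma V)\,\xi f V^2$ with $V = m_1 f + m_2$ yields the coefficients $a = \gamma \xi m_1^3$, $b = \xi m_1^2 (1 + 3\gamma m_2)$, $c = \xi m_1 m_2 (3\gamma m_2 + 2)$ and $d = \xi m_2^2 (\gamma m_2 + 1)$, and compares the direct and the expanded evaluation. The function names are illustrative.

```python
def cpu_power_coefficients(xi, gamma, m1, m2):
    """Coefficients (a, b, c, d) of P_cpu = a f^4 + b f^3 + c f^2 + d f."""
    a = gamma * xi * m1 ** 3
    b = xi * m1 ** 2 * (1.0 + 3.0 * gamma * m2)
    c = xi * m1 * m2 * (3.0 * gamma * m2 + 2.0)
    d = xi * m2 ** 2 * (gamma * m2 + 1.0)
    return a, b, c, d


def p_cpu_direct(f, xi, gamma, m1, m2):
    """P_cpu evaluated as (1 + gamma V) xi f V^2 with V = m1 f + m2."""
    v = m1 * f + m2
    return (1.0 + gamma * v) * xi * f * v ** 2


def p_cpu_polynomial(f, xi, gamma, m1, m2):
    """P_cpu evaluated through the expanded fourth-order polynomial."""
    a, b, c, d = cpu_power_coefficients(xi, gamma, m1, m2)
    return a * f ** 4 + b * f ** 3 + c * f ** 2 + d * f
```

Both evaluations should agree up to floating-point rounding for any frequency, which is a quick way to catch a mistyped coefficient.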

For further analysis, the normalized energy consumption $E_n$ is introduced, which allows for an analysis independent of code size and background power. The normalized energy consumption is defined as

$$E_n = \frac{E_{sys} - P_{back} \Delta t}{cc_b}. \tag{10}$$

Normalizing the energy consumption $E_{sys}$ has no effect whatsoever on its tentative convex properties, as $cc_b$ and $P_{back}$ merely induce an affine transformation of $E_n$ without rotation. $P_{back}$ has an effect on the convex properties. $P_{back}$ should, however, not be part of $E_n$, as this power component will be present in the system regardless of what the microprocessor is doing. As a consequence, $P_{back}$ should not influence the optimal operating settings of the microprocessor.

The energy function in Equation 10 is called strictly convex over the exploitable clock frequency window if and only if (iff)

$$\forall f_1 \ne f_2 \in \mathcal{F}_{exp}, \forall t \in (0, 1): \quad E_n(t f_1 + (1 - t) f_2) < t E_n(f_1) + (1 - t) E_n(f_2). \tag{11}$$

In other words, if $E_{sys}$ is strictly convex, then $E_{sys}$ possesses no more than one minimum in the exploitable frequency window. If the minimum of $E_{sys}$ is not within the microprocessor's boundaries, then the minimum $f_{opt}$ can be found via the first derivative of $E_n$, while its second derivative must remain positive:

$$\left. \frac{dE_n}{df} \right|_{f = f_{opt}} = 0 \quad \text{and} \quad \left. \frac{d^2 E_n}{df^2} \right|_{f = f_{opt}} > 0. \tag{12}$$


To simplify the derivative calculation for Equation 10, $E_n$ is split into a polynomial and a non-polynomial part, namely $E_n^{\beta}$ and $E_n^{f}$:

$$E_n = E_n^{\beta} + E_n^{f}$$

$$E_n^{\beta} = (a f^4 + b f^3 + c f^2 + d f + P_{drop,i}) \cdot \beta \tag{13a}$$

$$E_n^{f} = (a f^4 + b f^3 + c f^2 + d f + P_{drop,i}) \cdot \frac{1}{f - f_k} \tag{13b}$$

The respective derivatives are then as follows:

$$\frac{dE_n^{\beta}}{df} = (4 a f^3 + 3 b f^2 + 2 c f + d) \cdot \beta \tag{14a}$$

$$\frac{dE_n^{f}}{df} = \frac{3 a f^4 + (2b - 4 a f_k) f^3 + (c - 3 b f_k) f^2 - 2 c f_k f - (P_{drop,i} + d f_k)}{(f - f_k)^2} \tag{14b}$$

$$\frac{d^2 E_n^{\beta}}{df^2} = (12 a f^2 + 6 b f + 2 c) \cdot \beta \tag{14c}$$

$$\frac{d^2 E_n^{f}}{df^2} = \frac{6 a f^4 + (2b - 16 a f_k) f^3 + (12 a f_k^2 - 6 b f_k) f^2 + 6 b f_k^2 f + 2 (P_{drop,i} + c f_k^2 + d f_k)}{(f - f_k)^3} \tag{14d}$$

These equations will be used further on in Section 4 on parameter sensitivity analysis and are also the basis for the next section's approximate solutions.

Convex properties can be observed for $E_n$. For $f \to f_k$, the part of $E_n$ that is proportional to the slack time $\beta$, $E_n^{\beta}$, will approach $\beta P_{drop,i}$, whereas the part proportional to $1/(f - f_k)$, $E_n^{f}$, is amplified and tends to positive infinity because of the presence of $f - f_k$ in the denominator. When $f < f_k$, the system is spending more energy on overhead than on the actual program, as the overhead has priority over the program. In the limit, $E_n$ goes to infinity at $f_k$. At this point the system is overloaded and is no longer reactive from the point of view of $cc_b$. For $f \to \infty$, it is the power polynomial that inflates, whereas the execution time factor $1/(f - f_k)$ approaches zero. In other words, for the smaller clock frequencies, by virtue of the increased execution time, more energy due to leakage currents needs to be accounted for. The execution time for large frequencies is dramatically lower, but the dynamic power consumption of the microprocessor increases cubically and the leakage currents increase quartically with clock frequency. As a result, the convex minimum of the energy function, at the optimal frequency $f_{opt}$, is the point where a balance is found between the consequences of the inflated execution time and the total power demands of the microprocessor.

Given an energy/frequency convex behavior, three classes of microprocessor configurations can be distinguished, as shown in Figure 2. When the optimal clock frequency $f_{opt}$ is left of the default clock frequency window ($f_{opt} < f_{min}$), setting the clock frequency at $f_{min}$ yields the best energy gains; if $\max(f_{min}, f_k) \le f_{opt} \le f_{max}$, then chasing $f_{opt}$ will earn the best energy efficiency; and when $f_{opt} > f_{max}$, then the race-to-halt$^{1}$ energy optimization technique is shown to be most effective. It was noted by Rizvandi [11] that under certain circumstances it can be more efficient, in terms of energy consumption, to have a binary frequency scheme, including the maximum and minimum clock frequency, rather than scaling the clock frequency through the whole frequency space. The presented performance-oriented work, and also the user-oriented work of Seeker et al., suggest that this is, in fact, not the case: $f_{opt}$ may assume any frequency within the default clock frequency window, and may fluctuate throughout the code execution depending on the kind of operations scheduled.

1. The energy optimization technique race-to-halt runs the microprocessor at full speed until all tasks are completed; then the microprocessor is put in a low-power mode.

Fig. 2. The location of the optimal frequency $f_{opt}$ w.r.t. the default clock frequency window (blue) is an indication of which energy optimization technique is most effective: (a) when $f_{opt}$ is left of the exploitable clock frequency window ($f_{opt} < f_{min}$), one should set the clock frequency as low as possible; (b) if $\max(f_{min}, f_k) < f_{opt} < f_{max}$, then chasing $f_{opt}$ will yield the best energy efficiency; (c) when $f_{opt} > f_{max}$, then the race-to-halt energy optimization technique is most effective. Powerful microprocessors are most likely to fall in category (c), e.g., DGEMM 8C in Figure 7c, whereas low-power microcomputers are more likely to be in category (b), e.g., TI C62 in Figure 7f.
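The three classes can be illustrated with a small numerical sketch. It assumes the energy per user clock cycle is $(P_{cpu} + P_{other})(\beta + 1/(f - f_k))$ with $P_{cpu} = (1 + \gamma V)\xi f V^2$ and $V = m_1 f + m_2$, where the hypothetical parameter `p_other` lumps $P_{drop}$ and $P_{back}$ (the sensitivity analysis varies $P_{back}$ when locating $f_{opt}$). The golden-section search is a generic stand-in for the derivative-based solution, it assumes a single minimum on the search interval, and all function names are illustrative. The parameter values are the reference values listed in Section 4.

```python
import math


def normalized_energy(f, xi, gamma, m1, m2, p_other, f_k=0.0, beta=0.0):
    """Energy per user clock cycle at frequency f (GHz)."""
    v = m1 * f + m2                              # affine voltage/frequency relation
    p_cpu = (1.0 + gamma * v) * xi * f * v ** 2  # dynamic plus leakage power
    return (p_cpu + p_other) * (beta + 1.0 / (f - f_k))


def golden_section_min(func, lo, hi, tol=1e-9):
    """Locate the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = func(x1), func(x2)
    while hi - lo > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = func(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = func(x2)
    return 0.5 * (lo + hi)


def optimal_frequency(xi, gamma, m1, m2, p_other, f_k=0.0, beta=0.0, f_top=10.0):
    """Unconstrained f_opt searched on (f_k, f_top], in GHz."""
    def energy(f):
        return normalized_energy(f, xi, gamma, m1, m2, p_other, f_k, beta)
    return golden_section_min(energy, f_k + 1e-6, f_top)


def classify(f_opt, f_min, f_max, f_k=0.0):
    """Map f_opt onto the three classes of Figure 2."""
    if f_opt < max(f_min, f_k):
        return "run at lowest frequency"
    if f_opt > f_max:
        return "race-to-halt"
    return "chase f_opt"


# Reference values of Section 4 (xi_max, gamma, m1, m2); p_other is in watts.
params = dict(xi=0.181, gamma=3.137, m1=0.330, m2=0.808)
f_opt = optimal_frequency(p_other=0.5, **params)
print(round(f_opt, 2), classify(f_opt, f_min=0.2, f_max=1.6))
```

Searching for the unconstrained optimum first and classifying it afterwards keeps the window logic separate from the energy model, so either part can be swapped for a measured curve or a closed-form solution.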

2.6 Approximate Optimal Clock Frequency $f_{opt}$

The power model (Equation 9) in the energy consumption formulation of Equation 10 is of the fourth order. When the fourth-order power equation can be adequately approximated with a quadratic polynomial, the derivations can be simplified somewhat. The power consumption $P_{cpu}$ of the microprocessor can then be represented as:

$$P_{cpu} = a f^4 + b f^3 + c f^2 + d f \approx k f^2 + l f + m, \tag{15}$$

and accordingly the normalized energy consumption of the system becomes

$$E_n = (k f^2 + l f + m + P_{drop,i}) \cdot \left( \beta + \frac{1}{f - f_k} \right), \tag{16}$$

where $k \in \mathbb{R}^+$, though $\{l, m\} \in \mathbb{R}$. The first and second derivatives of the normalized energy consumption are then as follows:

$$\frac{dE_n}{df} = \beta (2 k f + l) - \frac{f_k (2 k f + l) - k f^2 + m + P_{drop,i}}{(f - f_k)^2} \tag{17a}$$

$$\frac{d^2 E_n}{df^2} = 2 k \beta + \frac{2 (k f_k^2 + f_k l + m + P_{drop,i})}{(f - f_k)^3} \tag{17b}$$

There exists a convex minimum if Equation 17a has a root and is a monotonically increasing function. In other words:

$$\begin{aligned} 0 &= 2 k \beta f^3 + (k + \beta (l - 4 f_k k)) f^2 + 2 f_k \beta (f_k k - l) f - 2 f_k k f - (m + P_{drop,i} + f_k l (1 - \beta f_k)) \\ 2 k \beta &> -\frac{2 (k f_k^2 + f_k l + m + P_{drop,i})}{(f - f_k)^3}. \end{aligned} \tag{18}$$

The solution to Equation 18 is the frequency that minimizes energy consumption. Via Ferrari's solution [12] for the calculation of the roots of a third-order polynomial, the optimal frequency can be determined analytically. Yet, the analytical formulation to calculate the roots of a cubic polynomial is still elaborate. Let us assume some further simplifications. For $\beta = 0$, one gets that

$$f_{opt} = f_k + \frac{\sqrt{k^2 f_k^2 + k (m + P_{drop,i} + f_k l)}}{k}, \qquad 0 < \frac{2 (k f_k^2 + f_k l + m + P_{drop,i})}{(f - f_k)^3}. \tag{19}$$

If all parameters are elements of $\mathbb{R}^+$, the latter inequality holds whenever $f_k < f$. Additionally, for $f_k = 0$, one obtains

$$f_{opt} = \sqrt{\frac{m + P_{drop,i}}{k}}, \qquad 0 < \frac{2 (m + P_{drop,i})}{f^3}, \tag{20}$$

which is only valid for $-P_{drop,i} < m$. These simplified models for $\beta = f_k = 0$ may be used when the context allows for it, i.e., when $cc_b$ is executed without any interruption. For example, from practical experience and in the literature, $f_k$ is often observed to be close to zero in a multi-core context. $\beta$ may vary considerably for different applications and should be assessed before being deemed insignificant.

3 Experimental Results

In this section, experimentally obtained power and execution time measurement traces are presented and used as a reference to study the Energy/Frequency Convexity Rule in the next section.


3.1 Platform and Benchmark Description 

A Samsung Galaxy S2, sporting an ARM Cortex A9 dual-core microprocessor, was used as testbed. The A9 supports clock frequencies ranging from 0.2 GHz to 1.6 GHz in steps of 100 MHz. The Gold-Rader implementation of the bit-reverse algorithm was used as benchmark; it is part of the ubiquitous Fast Fourier Transform (FFT) algorithm, in which it deterministically rearranges elements in an array. Besides the Gold-Rader algorithm, the BEEBS benchmark [13] was also run on an ODROID XU+E, featuring an Exynos 5240, while the execution time and power were measured. The measurement data of the Gold-Rader algorithm and BEEBS show large similarities. The Gold-Rader algorithm is chosen as the basis for the exposition in the sequel. More information on the BEEBS measurements can be found in De Vogeleer.

TABLE 1
Benchmark execution time and power model parameters: $\xi$, $\gamma$ and $P_{sys}$ as per Equation 3, and $cc_b$, $f_k$, $\beta$ as per Equation 4, for running the Gold-Rader algorithm on the A9 microprocessor. These values were used for the fitted models in Figure 3b.

Gold-Rader, input size $2^N$, A9
$N$          6        8        10       12        14       16
$cc_b$       1.943    8.596    31.1     144.359   670.8    2918.837
$f_k$        0.134    0.129    0.137    0.13      0.13     0.129
$\beta$     -0.166   -0.167   -0.152   -0.202    -0.183   -0.182
$\xi$        0.101    0.108    0.134    0.137     0.44     0.011
$\gamma$     5.578    5.127    4.030    4.36      1.035    65.985
$P_{sys}$    0.480    0.480    0.477    0.469     0.394    0.407


3.2 Execution Time and Power Consumption 

Figure 3a shows the execution time of the Gold-Rader algorithm on the A9 microprocessor. Table 1 shows the fitted execution time parameters as per Equation 4. The fitted execution time model has a relative error such that 90% of the errors are between 0.18% and 7.36%, with a median of 3.12% for the execution time traces.

Figure 3b shows the power profile of the Gold-Rader algorithm on the A9. All traces were recorded while the temperature of the hardware fluctuated. During the recording of the power traces the temperature of the testbed was artificially oscillated around 37 °C, and then the power samples at a temperature of 37 °C were selected.

Table 1 also shows the fitted values for $\xi$, $\gamma$ and $P_{sys}$ as per Equation 3 for the A9. The discrete voltage/frequency pairs reported in Figure 1 for the Exynos 4210 were used to fit the measured data.

The fitted model parameters in Table 1 seem to be consistent for the smaller input sizes. The fitted model parameters for the larger input sizes seem to be much different. Note that the smallest arrays fit in the L1 cache, while the largest are too big to fit in the L2 cache. Therefore external memory accesses and microprocessor slack time may influence the power of the microprocessor. Overall, the power variation across the different input sizes is not as large as what was observed for the execution time. The power of all traces is of the same order of magnitude, whereas the execution time may differ by multiple orders.

As observed from Figure 3b, the power model fits well on the experimental data. The fitting errors for the A9 are between 0.07% and 3.18% with a median of 0.86%. The fitted model for the A9 in Figure 3b seems to deviate persistently from the measured data for $f$ = 1.5 GHz. This could be due to a slightly higher supply voltage at 1.5 GHz than reported in Figure 1 for the Exynos 4210 microprocessor.


3.3 Energy Consumption 

The estimated experimental energy consumptions are obtained by multiplying the power traces with the execution time traces for each frequency. This was done for both the experimental traces and the fitted power and execution time models. Figure 3 shows the energy consumption of the Gold-Rader algorithm on the A9 microprocessor. The fitting errors are the sum of the errors of the power and execution time traces separately. For the A9 traces a clear minimum energy consumption is observed between 500 MHz and 800 MHz.
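The per-frequency multiplication can be sketched as follows. The traces below are hypothetical placeholders, not the measured Galaxy S2 data, and the function names are illustrative; the sketch only shows how an energy curve and its minimum follow from paired power and execution time samples.

```python
def energy_per_frequency(power_w, time_s):
    """Multiply power and execution time traces frequency by frequency."""
    return {f: power_w[f] * time_s[f] for f in power_w if f in time_s}


def frequency_of_minimum(energy_j):
    """Sampled frequency at which the energy is lowest."""
    return min(energy_j, key=energy_j.get)


# Hypothetical traces keyed by clock frequency in GHz.
power_w = {0.2: 0.50, 0.6: 0.62, 1.0: 0.90, 1.4: 1.45}
time_s = {0.2: 10.0, 0.6: 3.0, 1.0: 2.1, 1.4: 1.5}

energy_j = energy_per_frequency(power_w, time_s)
best_f = frequency_of_minimum(energy_j)
```

Because the clock frequency is only available in discrete steps, the minimum found this way is the best sampled operating point rather than the continuous optimum of the analytical model.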

4 Sensitivity of the Convexity Model 

To analyze the behavior and parameter sensitivity of the convexity model of Section 2, the Cortex A9 processor of the Exynos 4210 is used as the reference use case, representative of embedded multimedia applications, e.g., smartphones. The following values were used, based on the measurements presented in the previous section: $m_1$ = 0.330 [V/GHz], $m_2$ = 0.808 [V], $\beta$ = 0 [s], $\gamma$ = 3.137 [V$^{-1}$], $f_k$ = 0.130 [GHz], $\xi_{max}$ = 0.181 [W/(GHz·V$^2$)], $\xi_{min}$ = 0.155 [W/(GHz·V$^2$)], $P_{drop}$ = 0 [W]. The microprocessor's clock frequency starts at 200 MHz and goes up to 1.6 GHz, and $\xi$ is a parameter that describes the power profile of an application. The values for $\beta$, $f_k$, $\gamma$ and $\xi$ were determined via fitting as presented in the previous sections. The microprocessor's clock frequency is also considered a continuous variable from here on. In reality the clock frequency is limited to a discrete set of values. However, for analytical purposes, not to mention the aesthetics of the graphs, the clock frequency is deemed continuous.

In the next sections we will look at how time thieves and OOE impact the convexity model. Time thieves are basically clock cycles lost to overhead, whereas OOE is an intelligent instruction execution scheme to minimize execution slack time.

4.1 What About Those Time Thieves? 

When considering the execution time of a code sequence, $f_k$ was previously defined as the number of clock cycles per time unit not available to the execution of the user code. These clock cycles are spent, for example, to handle microprocessor exceptions or to execute operating system routine tasks. $f_k$ can therefore be regarded as little time thieves. From a mathematical point of view, the presence of $f_k$ in Equation 4 also introduces some complexity for derivations such as those of Section 2.5. Bear in mind that the microprocessor's clock frequency $f$ is always larger than $f_k$; otherwise the execution time is not defined. Consequently, $f_k < f_{max}$ must be satisfied.

(a) execution time measurement (b) power measurement

Fig. 3. Experimental data for the Cortex A9 microprocessor. The energy consumption of the benchmarks with different input sizes is shown for the
Gold-Rader algorithm. The solid lines represent the measured data, whereas the dotted lines are the product of the fitted power and execution time
models from the respective earlier figures.

Figure 4 shows how the optimal frequency $f_{opt}$ responds to
$f_k$ for different levels of the microprocessor power
($P_{cpu} \propto \xi$) and the background power $P_{back}$. In the
bottom plot it is seen that
$f_{opt}(f_k = 0, P_{back} = 0.5) \approx 0.8$ GHz. The optimal
frequency increases for increasing values of $f_k$ and hits
the microprocessor's maximum frequency $f_{max} = 1.6$ GHz
around $f_k = 0.7$ GHz. At this point, about 45% ($\approx 0.7/1.6$)
of the clock cycles would not be available to the code
sequence. Furthermore, it is observed that $f_{opt} > f_k$ always
holds. The effect of the microprocessor's power demands,
expressed by the $\xi$ parameter, on $f_{opt}$ is fairly small. A
30 MHz to 50 MHz difference in $f_{opt}$ is observed between
the minimum and maximum microprocessor power usage
as $\xi$ varies between 0.155 and 0.181 (see Figure 4).
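
The share of clock cycles taken by time thieves at the point quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `lost_cycle_fraction` is hypothetical, and the code assumes the lost share is simply $f_k / f$, which is how the 0.7/1.6 figure in the text is formed.

```python
def lost_cycle_fraction(f_ghz: float, f_k_ghz: float) -> float:
    """Fraction of clock cycles per time unit not available to user code.

    Assumption: the share is f_k / f, mirroring the 0.7/1.6 example in the text.
    """
    if f_k_ghz < 0:
        raise ValueError("f_k must be non-negative")
    if f_ghz <= f_k_ghz:
        # The text requires f > f_k; otherwise the execution time is undefined.
        raise ValueError("clock frequency f must be larger than f_k")
    return f_k_ghz / f_ghz


if __name__ == "__main__":
    # Values taken from the text: f_max = 1.6 GHz, f_k = 0.7 GHz.
    share = lost_cycle_fraction(1.6, 0.7)
    print(f"share of cycles lost to time thieves: {share:.3f}")
```

The guard on `f_ghz <= f_k_ghz` reflects the constraint stated earlier that the clock frequency must exceed $f_k$.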

The background power usage $P_{back}$ has a bigger impact
on $f_{opt}$ than $\xi$. For $P_{back} = 0$, $f_{opt}$ even drops below
the minimum operating frequency of the microprocessor.
Increasing $P_{back}$ inflates $f_{opt}$. For $f_k = 0$ and
$P_{back} \approx 2.5$ W the optimal frequency already surpasses
$f_{max}$. For a typical value of $f_k$ (130 MHz), $f_{opt}$ increases
with $P_{back}$; yet, the increase becomes smaller
for larger values of $P_{back}$. The average difference between
$f_{opt}(f_k = 0)$ and $f_{opt}(f_k = 0.13)$, within the
microprocessor's clock frequency range, is approximately 100 MHz.

In the rest of this section it will be assumed for simplicity
that $f_k \ll f$ unless otherwise stated. For a more realistic
estimate of $f_{opt}$ in case $f_k$ is not negligible, it was observed
from the graphs that adding 100 MHz to $f_{opt}$ is a reasonable
approximation.


4.2 Absence of Time Thieves 


(a) $f_{opt}(f_k, \xi)$

(b) $f_{opt}(f_k, P_{back})$

Fig. 4. Optimal microprocessor frequency $f_{opt}$ for variable levels of $f_k$ as a
function of $\xi$ (top) and $P_{back}$ (bottom). A typical value for $f_k$
is drawn at 0.13 GHz (dashed vertical line). The area enclosed by
the dotted line marks the microprocessor's exploitable clock frequency
window: $\max(0.2\ \text{GHz}, f_k) < f < 1.6\ \text{GHz}$.

(a) $f_{opt}(P_{back}, \xi)$, with $P_{back}$ in W

(b) $P_{cpu}/P_{back}$ ratio at $f_{opt}$

Fig. 5. Optimal microprocessor frequency $f_{opt}$ for variable background
power consumption $P_{back}$. On the top, $f_{opt}$ is shown for various
microprocessor loads $\xi$. On the bottom, the ratio between the background
power and the microprocessor power $P_{cpu}$ at $f_{opt}$ is shown. The area
between the dotted lines marks the effective clock frequency window:
$0.2\ \text{GHz} < f < 1.6\ \text{GHz}$.

It is not unthinkable that, in particular contexts, $f_k$ is indeed
negligibly small compared to $f$: $f_k \ll f$. For example,
this may occur when the microprocessor's clock frequency
is reasonably high, or when the code sequence of
concern is running without interruption on only one of the
available cores of a multi-core microprocessor. Assuming
$f_k$ negligible considerably simplifies the energy equation. For
$\max(f_{min}, f_k) < f_{opt} < f_{max}$, $E_n$ was said to be strictly
convex iff there exists only one point in the exploitable
clock frequency window for which $dE_n/df = 0$ and
$d^2E_n/df^2 > 0$. Given the system of equations, these two
requirements translate, respectively, into:

$$\frac{P_{back}}{f_{opt}^2} = 4a\beta f_{opt}^3 + 3(a + b\beta) f_{opt}^2 + 2(b + c\beta) f_{opt} + (d\beta + c), \quad (21)$$

$$0 < 12a\beta f_{opt}^2 + 6(a + b\beta) f_{opt} + 2(b + c\beta) + \frac{2P_{back}}{f_{opt}^3}. \quad (22)$$
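
As a cross-check of Equations 21 and 22, the two conditions can be re-derived from the normalized energy expression given in Section 4.3 under the assumption $f_k \approx 0$. The expansion below is a reader's sketch rather than part of the original derivation; it only uses the symbols $a$, $b$, $c$, $d$, $\beta$, and $P_{back}$ already defined in the text.

$$
\begin{aligned}
E_n &= \left(a f^4 + b f^3 + c f^2 + d f + P_{back}\right)\left(\beta + \frac{1}{f}\right) \\
    &= a\beta f^4 + (a + b\beta) f^3 + (b + c\beta) f^2 + (c + d\beta) f + d + \beta P_{back} + \frac{P_{back}}{f}, \\
\frac{dE_n}{df} &= 4a\beta f^3 + 3(a + b\beta) f^2 + 2(b + c\beta) f + (d\beta + c) - \frac{P_{back}}{f^2}, \\
\frac{d^2E_n}{df^2} &= 12a\beta f^2 + 6(a + b\beta) f + 2(b + c\beta) + \frac{2P_{back}}{f^3}.
\end{aligned}
$$

Setting the first derivative to zero at $f = f_{opt}$ and moving $P_{back}/f_{opt}^2$ to the left-hand side gives Equation 21; requiring the second derivative to be positive gives Equation 22.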

Recall that for all constants in this system of equations:
$\{a, b, c, d, \beta\} \in \mathbb{R}^+$. Thus the requirement in Equation 22
is satisfied by default, as the right-hand side will never be
negative. Accordingly, the root requirement of Equation 21
is also satisfiable. It is immediately clear that the background
power demand $P_{back}$ directly controls the optimal
frequency $f_{opt}$. The constants $\{a, b, c, d\}$ describe the
microprocessor's power usage, whereas $P_{back}$ describes the power
demands of everything in the computer system besides the
microprocessor. For systems with a large $P_{back}$, e.g., servers
or desktop computers, $f_{opt}$ will therefore be higher than for
systems with a low $P_{back}$, e.g., wireless sensors. Moreover,
$f_{opt}$ may be so high that it exceeds the microprocessor's
maximum clock frequency.
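
The root requirement of Equation 21 can be explored numerically. The sketch below multiplies Equation 21 by $f^2$ and finds its positive root by bisection; for non-negative constants the resulting polynomial is increasing in $f$, so the root is unique. The function names and the constant values used in the example ($a = b = c = d = 0.1$) are purely illustrative assumptions, not the fitted values of the paper, and units are assumed to be mutually consistent.

```python
def eq21_residual(f, a, b, c, d, beta, p_back):
    """Right-hand side minus left-hand side of Equation 21, multiplied by f^2."""
    return (4 * a * beta * f**5
            + 3 * (a + b * beta) * f**4
            + 2 * (b + c * beta) * f**3
            + (d * beta + c) * f**2
            - p_back)


def optimal_frequency(a, b, c, d, beta, p_back, iterations=100):
    """Positive root of Equation 21 by bisection (sketch, f_k assumed negligible).

    Returns 0.0 for p_back == 0, where the residual has no positive root.
    """
    if min(a, b, c, d, beta, p_back) < 0:
        raise ValueError("all parameters must be non-negative")
    if eq21_residual(1.0, a, b, c, d, beta, 0.0) <= 0:
        raise ValueError("power model does not depend on frequency")
    if p_back == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while eq21_residual(hi, a, b, c, d, beta, p_back) < 0:
        hi *= 2.0  # grow the bracket until the residual changes sign
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if eq21_residual(mid, a, b, c, d, beta, p_back) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical constants, chosen only to illustrate the trend.
    for p_back in (0.1, 0.5, 1.0, 2.5):
        f_opt = optimal_frequency(0.1, 0.1, 0.1, 0.1, 0.0, p_back)
        print(f"P_back = {p_back:4.1f} -> f_opt = {f_opt:.3f}")
```

With any such set of non-negative constants, the computed $f_{opt}$ grows with $P_{back}$, which is the behavior described in the paragraph above.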

Figure 5 shows the optimal frequency for a variable
background power consumption $P_{back}$ and microprocessor
loads $\xi$. Also, the ratio between the microprocessor power
$P_{cpu}$ and the background power $P_{back}$ is given. The
area enclosed by the dotted line marks the operating
range of the microprocessor. For the microprocessor to be
able to exploit the minimum-energy operating frequency,
the background power consumption needs to be between
0.02 W and about 2.75 W, depending on the exact
microprocessor load. The influence of the different microprocessor
loads on $P_{back}$ is not significant; at 1.6 GHz there is a 0.5 W
difference between $P_{back}$ for $\xi_{min}$ and $\xi_{max}$. If $P_{back}$ is
larger than 2.75 W, it is advised to run the microprocessor
at the maximum clock frequency to minimize energy
consumption. Under such conditions, the energy optimization
technique known as race-to-halt is a good strategy.
This was also Yuki and Rajopadhye's main conclusion
while studying high-performance computers. The optimal
frequency $f_{opt}$ surpasses the microprocessor's maximum
frequency roughly around the point where the background
power demands become larger than the microprocessor's
power usage. Battery-powered electronic systems such as
embedded systems, wireless sensors, or smartphones aim
at minimizing their background power demands, which
increases the feasibility of exploiting $f_{opt}$. For more
powerful computers, however, such as servers, the optimal
frequency will very likely be out of reach of the
microprocessor's capabilities: $f_{opt} > f_{max}$. For example, Seo et al. [14]
claim that Dynamic Voltage and Frequency Scaling (DVFS)
in general hardly improves the energy efficiency of mobile
multimedia electronics. The testbed power measurements
of their embedded system show, however, that their $P_{cpu}$
to $P_{back}$ ratio is smaller than 1 to 18, and their mi is very
small. For their specific testbed, $f_{opt}$ is very likely larger
than $f_{max}$, and race-to-halt should indeed be most beneficial
when aiming for energy savings.
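
The decision rule discussed in this section can be summarized as a small helper. This is a hedged sketch, not the authors' implementation: the function name and the strategy labels are hypothetical, the default window of 0.2 GHz to 1.6 GHz is the one quoted for the studied microprocessor, and pinning to the lowest frequency when $f_{opt}$ falls below the window is an inference from the convexity of the energy curve rather than a statement from the text.

```python
def choose_operating_point(f_opt, f_min=0.2, f_max=1.6):
    """Map an unconstrained optimum f_opt (GHz) onto the exploitable window.

    Returns a (frequency, strategy) tuple.
    """
    if not 0 < f_min < f_max:
        raise ValueError("expected 0 < f_min < f_max")
    if f_opt >= f_max:
        # The optimum is out of reach: run flat out and go idle (race-to-halt).
        return f_max, "race-to-halt"
    if f_opt <= f_min:
        # Assumption: on a convex curve the energy only rises above f_opt,
        # so the lowest available frequency is the best reachable choice.
        return f_min, "pin to f_min"
    return f_opt, "run at f_opt"


if __name__ == "__main__":
    for candidate in (0.1, 0.9, 2.1):
        print(candidate, choose_operating_point(candidate))
```

For a server-like system whose $f_{opt}$ lies above 1.6 GHz the helper selects race-to-halt, whereas a low-power system with $f_{opt}$ inside the window is run at $f_{opt}$ itself.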


4.3 Out-of-Order Execution 

Out-of-order execution (OOE) is parametrized via $\beta \in [0, \infty[$.
$\beta = 0$ when OOE is perfectly able to
cover the time during external memory accesses with
data-independent code execution; otherwise $\beta$ is larger than 0.
The system's normalized energy consumption, assuming
$f_k \approx 0$, is given by:

$$E_n = \left(a f^4 + b f^3 + c f^2 + d f + P_{back}\right) \cdot \left(\beta + \frac{1}{f}\right)$$

Its requirements for convexity are defined the same as for
the case where time thieves are absent, given by Equations 21
and 22. It can be observed that for $\beta = 0$ the leftmost
term on the right-hand side of Equation 21 becomes zero, resulting in an
increased $f_{opt}$ for the equality to be satisfied. Similarly, the
larger $\beta$, the more $f_{opt}$ needs to decrease for the equality
to remain satisfied. Figure 6 shows the sensitivity of the
optimal frequency $f_{opt}$ to the $\beta$ parameter. Indeed, from
the figure, it is observed that $f_{opt}$ decreases for increasing
$\beta$. Moreover, $f_{opt}$ changes by about 100 MHz over a 0 to 0.25 μs
$\beta$ range for medium levels of $P_{back}$. The larger $P_{back}$, the
larger the spread in $f_{opt}$ for variable $\beta$. For $P_{back}$ over 4 W,
the $f_{opt}$ spread between $\beta = 0$ and $\beta = 0.25$ increases to
more than 200 MHz.
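
The effect of $\beta$ on the location of the energy minimum can also be illustrated by evaluating the normalized energy directly and scanning the clock frequency window. The sketch below is illustrative only: the helper names are hypothetical, the constants ($a = b = c = d = 0.1$, $P_{back} = 0.5$) are made-up values rather than the fitted ones, and a plain grid search stands in for whatever numerical method was used to produce the figures.

```python
def normalized_energy(f, a, b, c, d, beta, p_back):
    """E_n = (a f^4 + b f^3 + c f^2 + d f + P_back) * (beta + 1/f), f_k ~ 0."""
    if f <= 0:
        raise ValueError("frequency must be positive")
    power = a * f**4 + b * f**3 + c * f**2 + d * f + p_back
    return power * (beta + 1.0 / f)


def argmin_on_grid(energy, f_min, f_max, steps=1401):
    """Frequency on a uniform grid in [f_min, f_max] with the lowest energy."""
    grid = [f_min + i * (f_max - f_min) / (steps - 1) for i in range(steps)]
    return min(grid, key=energy)


if __name__ == "__main__":
    # Hypothetical constants; window 0.2 GHz to 1.6 GHz as in the text.
    for beta in (0.0, 0.1, 0.25):
        f_best = argmin_on_grid(
            lambda f: normalized_energy(f, 0.1, 0.1, 0.1, 0.1, beta, 0.5),
            0.2, 1.6)
        print(f"beta = {beta:.2f} -> f_opt = {f_best:.3f}")
```

Because the energy curve is convex over the window, the grid minimum moves to lower frequencies as $\beta$ grows, in line with the trend described above.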

In theory, $\beta$ can be frequency-dependent as well. That
is, the memory clock frequency can be scaled along with
the microprocessor's frequency to ensure the timely
delivery of data to the microprocessor's registers and caches.
In such a case $\beta$ would not be constant over $f$. Here, it was
assumed that the microprocessor's clock frequency, once
set at $f_{opt}$, does not change over time. Another common
approach to save energy is to use a variable clock frequency
to minimize OOE slack time and, with it, energy consumption.


5 State of the Art 

(a) $f_{opt}(\beta, \xi)$

(b) $f_{opt}(\beta, P_{back})$

Fig. 6. Optimal microprocessor frequency $f_{opt}$ for variable levels of $\beta$ as a
function of $\xi$ (top) and $P_{back}$ (bottom). The area below
the horizontal dotted line marks the microprocessor's default clock
frequency window ($0.2\ \text{GHz} < f < 1.6\ \text{GHz}$).

In the previous sections, it was shown that the energy
consumption of a microprocessor shows convex properties with
regard to its clock frequency. The convex energy consumption
curve has been mentioned several times before in
the literature. A sensitivity study of the convexity model,
as presented here, has not been reported before. A series
of papers, approaching the problem from a chip point of
view without considering software, have shown the
energy consumption with respect to Dynamic Voltage and
Frequency Scaling (DVFS). The literature
puts forward some motivation for the energy consumption's
convexity, but rarely provides analytical frameworks based
on physical explanations. For example, Senn et al. and
Austin and Wright provide a heuristic model. Other
studies, e.g., Hager et al. and Freeh et al., discuss
the consequences of said behavior and how
to exploit them, from a high-level point of view. Other
researchers have also shown energy measurements under
DVFS, but their measurements show no convexity,
e.g., Sinha and Chandrakasan and Simunic et
al., who are not running their benchmarks on top of
an Operating System (OS). Authors such as Austin and
Wright and Snowdon et al. [15], [16] have shown more
specifically that for applications with certain behavioral
patterns no energy convexity is observed. However, the energy
consumption model presented in our work can explain such
behavior.

In the Very-Large-Scale Integration (VLSI) design domain,
voltage scaling has also been discussed, but usually
for a fixed frequency. The aim of the voltage
scaling is to find a minimum-energy operating point
where the digital circuit still yields the correct output. The major
trade-off is between increased circuit latency and leakage
power on the one hand, and decreasing dynamic power on the
other. This trade-off also yields a convex energy consumption
curve, but for a fixed frequency. In this paper, however, the
combined effect of voltage/frequency scaling is of interest.

There are some works that cover the energy/frequency
convexity properties in a more analytical framework.
Figure 7 shows excerpts of convex energy graphs provided
by the cited works. Yuki and Rajopadhye explored the
particular case of energy consumption of high-performance
computers in the context of compiler optimization and
optimal frequency conditions of the microprocessor. One
of their conclusions is that for power-hungry systems the
race-to-halt energy optimization technique is more effective
than DVFS. Hager et al., on the other hand, showed
that race-to-halt is not always the most effective strategy
in a multi-core context with bandwidth-bound codes.
The authors studied the energy consumption of modern
multi-core chips via simple machine models and showed
how to minimize the energy consumption with respect to
the number of cores, serial code performance, and clock
frequency. Austin and Wright examined the energy
consumption of micro-benchmarks and applications on a
Cray XC30 supercomputer system. The authors developed a
simple linear heuristic energy model. They also stressed that
the frequency/energy minimum is application-specific. Cho
and Chang assessed the optimal frequency conditions
for a microprocessor in conjunction with a memory. Their
resulting model is fairly complex; yet the authors show the
feasibility of a microprocessor's optimal frequency conditions
in conjunction with a memory system. Cho and Melhem
produced a convex model derived from Amdahl's
law and extended with the notion of energy. The authors use
a simplifying assumption for the representation of power
and execution time. They show via their model that there
is a certain clock frequency range that yields both energy
and speed improvements. Similarly, Rizvandi et al. [30]
devised a convex model but, just as Cho and Melhem, assumed
simplified representations of power and execution time.
Vasilakis [31] showed experimental evidence for
a convex energy curve in relation to the microprocessor's
clock frequency for almost all individual instructions of the
ARM Cortex A7. No theoretical framework is provided by
Vasilakis, however, to back up these findings analytically.

From an experimental perspective, Halimi et al. [32]
claim to save up to 39% of energy, and Qiu et al. [33]
advertise an energy gain of 25%, by adjusting the
microprocessor's clock frequency via an experimental algorithm
with predefined user or application constraints. Although
no theoretical framework was provided by the authors
about the energy/frequency convexity, their algorithm is
essentially chasing the convex minimum. Senn et al.
also showed convex energy/frequency curves, based on a
simplified system model, for their TI C55, C62, C64, and C67
platforms.

Applications of the work presented in this paper focus
on embedded systems, in contrast with the work of Yuki and
Rajopadhye, Hager et al., and Austin and Wright,
which is dedicated to more powerful computer architectures.
The sensitivity of the parameters that constitute the
energy consumption equation is also analyzed via both an
analytical approach and experimental data, the former
fitted with data from the latter. The convex energy model
presented here is, in contrast with the mentioned works,
more extensive, which allows for more realistic modeling.
For example, temperature has not been a subject of interest
and a sensitivity analysis of parameters has also not been
carried out in any of the referenced works.

6 Conclusion 

In this paper we developed and analyzed the energy
consumption equation of a microprocessor operating in a
computer system with other components. An analytical study,
along with numerical simulation and measurement
data, was used to examine the behavior and sensitivity of
its parameters. It was shown through an analytical framework,
measurements, and a literature review that the energy
consumption curve shows convex properties with regard
to the clock frequency of the microprocessor. The convex
energy minimum is the point, at a given clock frequency
$f_{opt}$, where the computer system consumes the minimum
amount of energy while executing a code sequence.

The energy saving gained by running at the optimal
clock frequency is a trade-off with the performance of
the system, in terms of execution time. For applications
requiring human interaction, it has been shown by Seeker et
al., however, that the clock frequency can be scaled down
considerably without affecting the user's experience. More
generally, this kind of energy saving can be obtained for
code sequences where a limited slowdown can be tolerated
and time is not critical. For example, such slowdowns could
be applied to code sequences in multithreaded programs
that are not on the critical path [34].

(a) Fan et al. (c) Hager et al. (d) Snowdon et al. (e) Austin and Wright (f) Senn et al.

Fig. 7. Excerpts of energy/frequency measurements as found in the literature. Convex minimums are observable for the energy at a certain
microprocessor clock frequency, depending on the microprocessor and architecture. In the sequel the behavior of this convex minimum is analyzed.
All figures were originally published in the papers referenced in their respective captions.

The existence of the energy/frequency convexity property
was further confirmed via experimental measurement
traces of multimedia microprocessors commonly used for
embedded system applications. The main conclusions of the
analysis are:

• Energy/frequency convexity always occurs, but, to
exploit the convex minimum, $f_{opt}$ should be within
the exploitable clock frequency window;

• The background power requirement ($P_{back}$) is the
parameter that influences the optimal frequency the
most; the larger the background power demands,
the larger the optimal clock frequency: when $P_{back}$
equals $P_{cpu}$, $f_{opt}$ will be close to the maximum
microprocessor clock frequency;

• An application's power profile ($\xi$) has a minimal
effect on the optimal frequency, mostly because the
variations in power profiles are fairly small in the
experiments we ran, an average of 50 MHz in $f_{opt}$
between the power profile's extremities;

• The number of instructions of a code sequence has no 
influence on the optimal clock frequency, following 
the energy consumption model, but does scale the 
energy consumption linearly on the premise that ^ 
has minimal effect at constant temperature; 

• Application concurrency and clock cycle thieves ($f_k$)
significantly affect the optimal frequency; the fewer
clock cycles available to the applications, the larger
the optimal clock frequency: on average for a 1 GHz
increase in $f_k$, $f_{opt}$ increases by 2 GHz;

• Microprocessor slack time ($\beta$), during off-chip
operations, forces the optimal clock frequency down:
300 MHz for $0 < \beta < 0.25$ in the extreme case;

• The race-to-halt strategy is justified only when the
optimal clock frequency is larger than the
microprocessor's maximum frequency.


Given that $P_{back}$ has a large effect on the optimal frequency
$f_{opt}$, it was shown that a system with a $P_{back}$ of the order
of $P_{cpu}$ or larger will likely have an $f_{opt}$ outside the reach
of the microprocessor's clock frequency range. Thus chasing
the optimal clock frequency $f_{opt}$ is especially beneficial for
low-power systems, such as those in embedded applications, as
their $P_{back}$ is much smaller than what would be expected
for high-performance computer systems.